Project 3 - Part 5

Aligning Images with DL

For Part 5, I used the layers up to and including layer3 of ResNet18 as the backbone and then tried quite a few different architectures for the regression network. The custom ImageRegModel described below is set up to run the backbone on both images, concatenate the backbone outputs, and then pass the result through the regression part of the network to produce an output of size 2: the predicted x_shift and y_shift. For the regression part of the network I tried several different layers and settings and got the best performance with the layers below. I added dropout to increase regularization because my models tended to overfit the training set. I also used a few suggestions from https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html, such as disabling the bias in a Conv2d that is immediately followed by a BatchNorm2d, to speed up training, since the model took a while to train on my machine. I referenced the docs on pytorch.org quite a bit to understand certain layers and parameters and the impact they have on overall training.

I used and modified extract_patches_training.py and extract_patches_testing.py to create the training and testing sets of images from the cell images. I also created an extract_patches_validation.py, which I used to create a validation set so I could track model progress and watch for overfitting and underfitting during training. The training set contained patches from cell images 0001.000, 0001.001, and 0001.002. The validation set contained patches from cell image 0001.004. And the testing set contained patches from cell images 0001.005 and 0001.006.

I played with different hyperparameters and model architectures quite a bit over ~10 runs, tracked with TensorBoard, to find what seemed to work best (those runs are documented in the runs folder). As mentioned above, I had some trouble with the model overfitting, so I also modified the RegistrationDatasetLoader in utils.py to use a passed-in transform that included randomized ColorJitter and randomized GaussianBlur. To apply the same random transformation to both input images, I had to manipulate torch's RNG state. I had to look this up specifically because I couldn't find the information searching directly on pytorch.org. This led me to https://discuss.pytorch.org/t/torchvision-transfors-how-to-perform-identical-transform-on-both-image-and-target/10606/2 and then to https://github.com/pytorch/vision/issues/9, which has an example near the bottom of how to use get_rng_state and set_rng_state to do that. I also tuned the DataLoader parameters to speed up training by increasing the batch size and pinning memory. I used pytorch.org a lot to go through this information and these settings while working on improving my models.

I trained for a full epoch over the training set and then evaluated on the validation set after each epoch. Below that you can see the MSELoss on the testing set specifically, along with examples of the predicted x,y shift versus the actual x,y shift on images from both the testing dataset and the training dataset. You can see that my model does a decent job on the training set but doesn't always generalize to the testing set as well as I would like. Some offsets are close and the images look pretty well aligned, while others in the testing set are pretty far off. This project was very interesting, and I definitely learned a lot about how best to train and test the custom model.
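The train-then-validate loop described above can be sketched as follows. This is a simplified, hypothetical version: the optimizer choice, learning rate, and dataloader batch format (image pair plus shift target) are assumptions, not the exact training code from the project.

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=10, lr=1e-3, device="cpu"):
    """One full pass over the training set per epoch, then an MSE
    evaluation on the validation set to watch for overfitting."""
    model.to(device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    val_history = []
    for epoch in range(epochs):
        model.train()
        for img_a, img_b, shift in train_loader:
            optimizer.zero_grad()
            pred = model(img_a.to(device), img_b.to(device))
            loss = criterion(pred, shift.to(device))
            loss.backward()
            optimizer.step()
        # Validation pass after each epoch.
        model.eval()
        total, n = 0.0, 0
        with torch.no_grad():
            for img_a, img_b, shift in val_loader:
                pred = model(img_a.to(device), img_b.to(device))
                total += criterion(pred, shift.to(device)).item() * shift.size(0)
                n += shift.size(0)
        val_history.append(total / n)
    return val_history
```

Logging each epoch's training and validation losses (e.g. to TensorBoard) makes the overfitting pattern described below, where training loss keeps falling while validation loss turns back up, easy to spot.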

Below here are different training runs with some different hyperparameter settings and the other model as well.

This model overfit the training dataset. You can see that the training loss kept dropping while the validation loss dropped at first and then climbed back up as the model continued training. I found in training all these models that they tended to overfit relatively quickly, so I tried to make changes to combat that. You can see from the displayed transformations below that it doesn't do a very good job on the testing set and does an okay job on the training set.

This model has a slightly lower learning rate and performs okay compared to the other models that were trained. It still definitely overfits the training dataset.

This model has a lower learning rate and actually overfits the training set more than any other model. For example, you can see from the training output images that it does a great job of predicting the x,y shift, but the validation loss continues to grow every single epoch of training and ends up the highest of all the models I trained. Unsurprisingly, it also has the highest loss on the testing dataset.

This is my best performing model on the validation dataset and the testing dataset. It has the same hyperparameters as the first model trained above, differing only in the random initialization, which has a big impact on how well the model performs.

In the images below, you can see that some x, y shift predictions are somewhat close but many are off. The model still does not generalize as well as I would like, but you can see that it is approaching closer to the correct values on the test set as compared to the other trained models.

Below are examples from the training set along with the x, y shift predicted by the trained model. You can see that it does a pretty good job of predicting the x, y shift. Besides comparing the numbers, it is interesting to look at the corners to see how well they match up with the corners of the actual aligned images. If I let the model train for additional epochs, performance on the training images would likely improve, but it would potentially overfit the training set more and generalize worse.

This is a training run of the second model, which does not have the MaxPool2d, and you can see that it doesn't train very well even on the training set. This is the worst performing model of the ones I trained.